backpropagation-friendly eigendecomposition
Backpropagation-Friendly Eigendecomposition
Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data in small and arbitrary groups, doing so has no theoretical basis and makes its impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring to split them. We demonstrate the better robustness of our approach over standard ED and PI for ZCA whitening, an alternative to batch normalization, and for PCA denoising, which we introduce as a new normalization strategy for deep networks, aiming to further denoise the network's features.
Reviews: Backpropagation-Friendly Eigendecomposition
This is a point of confusion throughout the paper. In line 57, you start with given an input matrix M'. Listing the SVD / QR as a method to compute the the ED is not correct here. Later you clarify that M is considered to be a covariance matrix. You should clarify at the very beginning that the scope of your method is limited and restricted to symmetric matrices!
Reviews: Backpropagation-Friendly Eigendecomposition
There is mixed feedback in the reviews on the significance of the paper. In my view it is a fundamental building block in deep learning to be able to include modules that perform eigendecompositions in a stable and differentiable (via backprop) manner. While the idea of the paper is simple, i.e. combine the advantages of SVD and power iterations by using them in the forward and backward pass, respectively, there is real insight (see stability bounds) and solid empirical evidence (100% stability even in relatively high dimensions). I think this clarity - simple, but important idea, well analyzed - is a feature and makes this paper a valuable contribution.
Backpropagation-Friendly Eigendecomposition
Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data in small and arbitrary groups, doing so has no theoretical basis and makes its impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring to split them. We demonstrate the better robustness of our approach over standard ED and PI for ZCA whitening, an alternative to batch normalization, and for PCA denoising, which we introduce as a new normalization strategy for deep networks, aiming to further denoise the network's features.
Backpropagation-Friendly Eigendecomposition
Wang, Wei, Dang, Zheng, Hu, Yinlin, Fua, Pascal, Salzmann, Mathieu
Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data in small and arbitrary groups, doing so has no theoretical basis and makes its impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring to split them.
Backpropagation-Friendly Eigendecomposition
Wang, Wei, Dang, Zheng, Hu, Yinlin, Fua, Pascal, Salzmann, Mathieu
Eigendecomposition (ED) is widely used in deep networks. However, the backpropagation of its results tends to be numerically unstable, whether using ED directly or approximating it with the Power Iteration method, particularly when dealing with large matrices. While this can be mitigated by partitioning the data in small and arbitrary groups, doing so has no theoretical basis and makes its impossible to exploit the power of ED to the full. In this paper, we introduce a numerically stable and differentiable approach to leveraging eigenvectors in deep networks. It can handle large matrices without requiring to split them. We demonstrate the better robustness of our approach over standard ED and PI for ZCA whitening, an alternative to batch normalization, and for PCA denoising, which we introduce as a new normalization strategy for deep networks, aiming to further denoise the network's features.